Complexity vs. Load Factor (\(\alpha\))

Average probe cost grows with load. Different collision strategies behave differently as \(\alpha \to 1\).

Separate Chaining

Successful: \(1 + \frac{\alpha}{2}\)
Unsuccessful: \(\alpha\)

Linear Probing

Successful: \(\tfrac{1}{2}\big(1 + \tfrac{1}{1-\alpha}\big)\)
Unsuccessful: \(\tfrac{1}{2}\big(1 + \tfrac{1}{(1-\alpha)^2}\big)\)

Average Case: \(O(1)\)

Holds only while the load factor is kept below a policy threshold and the hash distributes well.

Worst Case: \(O(n)\)

Adversarial keys or a poor hash can force all items into few buckets / clusters.

Collision Strategies

  • Separate Chaining: Bucket holds list/tree. Expected length ≈ \(\alpha\).
  • Open Addressing: All keys in array; probe sequence on collision.
    • Linear probing: Simple, primary clustering.
    • Quadratic probing: Mitigates some clustering.
    • Double hashing: Better dispersion.
  • Robin Hood / Cuckoo / Hopscotch: Reduce variance & worst clusters.

Explore the Graph

Why Resizing Is Inevitable ⚠️🔁

  • 🚫 No perfect hash (general case): Finite table vs huge / unknown key space ⇒ collisions.
  • 🧩 Strategies alleviate, not eliminate: Chains / probe sequences still lengthen as \(\alpha\) rises.
  • 📈 Divergence: Linear probing terms with \(1/(1-\alpha)\) (and squared) explode near 1.
  • ⏱️ Degradation signal: Rising average probes or variance ⇒ approaching resize threshold.
  • Resize cost: One \(O(n)\) rehash → amortized still \(O(1)\).
  • 🔚 Bottom line: Good hash + collision handling delay slowdown, but sustained performance ultimately needs more buckets ✅.
Takeaway: Monitor load; when efficiency erodes, expand buckets to keep expected \(O(1)\) operations.